Abstract

We consider a model of computation motivated by possible limitations on quantum
computers. We have a linear array of n wires, and we may perform operations only on
pairs of adjacent wires. Our goal is to build circuits that perform specified operations
spanning all n wires. We show that the natural lower bound of n - 1 on circuit depth is
nearly tight for a variety of problems, and we prove linear upper bounds for additional
problems. In particular, using only gates adding a wire (mod 2) into an adjacent wire,
we can realize any linear operation in GL_n(2) as a circuit of depth 5n. We show that
some linear operations require depth at least 2n + 1.

1 Introduction 

We consider the following model of computation: We have n wires, labeled (1) through (n).
Each wire carries a single bit. We are allowed to perform reversible linear operations on
adjacent wires: (i) ⊕= (i + 1) or (i) ⊕= (i - 1). We assume throughout that n is at least 2.

Our goal is to perform some calculation spanning all n wires; for example, we might
want to set (n) ⊕= (1) and leave the other n - 2 wires unchanged. Our primary measure of
complexity is the depth of a circuit; we will also consider the size of the circuit (that is, the
number of gates).

The motivation for this problem is quantum circuit design. In some proposed models of 
quantum computation [SHH [TIE], we can perform operations only on adjacent bits, so it is 
important to consider the cost of computing with bits separated by a given distance. Since 
the eventual topology of quantum computers is unknown, we choose to focus on linear arrays 
of bits. Results here should at least be applicable to other topologies. 

We note that our model is wholly classical; there are no quantum operations. To perform 
a quantum gate, one could first move bits around using classical operations and then apply 
the quantum gate to adjacent bits. We discuss the cost of this approach in Section 3.1.

It is often helpful to take an algebraic view of these circuit problems. We adopt the 
convention that the wires of our circuit contain column vectors, and we describe the state 
of all of the wires by the matrix whose ith column is the contents of wire (i). A CNOT gate
adds the vector on one wire into the vector on another wire. Any circuit performs a series 
of column operations; note that circuits act on the right. 

Any function on n bits that can be built out of additions may be viewed as an element
of the group GL_n(2) of n × n invertible matrices over the field F_2 of two elements. A single
gate is represented by an elementary matrix consisting of the identity matrix with a single
1 either just above or just below the main diagonal. These matrices generate the group, so
we can build any reversible linear operation on our wires using these gates.¹

It is not hard to show that any element of GL_n(2) can be constructed from O(n^2) gates,
that is, as a product of O(n^2) of the above generators. A simple counting argument gives
a lower bound of Ω(n^2 / log n) for generic circuits. In Section [73, we give a lower bound of
(1 - o(1))n^2 for generic elements of GL_n(2).

Our primary complexity measure is depth, rather than size, so the generating set of 
interest is different. We allow any set of 1s just off the diagonal, as long as all the row and
column indices are distinct; we cannot have two gates using the same wire at the same time. 
All of our questions can be rephrased in this setting: What is the shortest product of these 
generators equal to a particular element of the group? 
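The model above can be sketched in a few lines of Python. This is only an illustrative simulator, not code from the paper: the names `run` and `depth` and the bitmask encoding (bit i - 1 of a wire records whether a_i contributes to it) are assumptions made here. The depth is computed by a greedy scheduler that places each gate in the earliest time-slice after the previous gates on its two wires, so it never exploits commuting gates.

```python
def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s), to wires 1..n.

    Wire (i) starts as the bitmask 1 << (i - 1), i.e. it holds only a_i.
    """
    wires = [0] + [1 << i for i in range(n)]  # index 0 is unused
    for t, s in gates:
        # Only gates on adjacent wires are allowed in this model.
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: gates that share a wire keep their listed order."""
    busy = [0] * (n + 2)  # busy[i] = last time-slice that used wire (i)
    for t, s in gates:
        slot = max(busy[t], busy[s]) + 1
        busy[t] = busy[s] = slot
    return max(busy)


# A naive way to perform (3) ^= (1) on three wires, leaving (2) unchanged.
naive = [(2, 1), (3, 2), (2, 1), (3, 2)]
print(run(naive, 3))    # wire masks after the circuit
print(depth(naive, 3))  # number of time-slices used
```

Two gates on disjoint pairs of wires, such as `[(2, 1), (4, 3)]`, share a single time-slice under this scheduler.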

We label the wires by (1) through (n) and their initial values by a_1 through a_n. In our
diagrams, we draw the wires horizontally, with time proceeding from left to right, wire (1)
at the top, and wire (n) at the bottom. We analyze the costs of the following problems:

Add Perform (n) = a_1 ⊕ a_n; for each other i, leave (i) = a_i.

Swap Set (n) = a_1 and (1) = a_n; for each other i, leave (i) = a_i.

Rotate Set (n) = a_1; for each i < n, set (i) = a_{i+1}.

Reverse Set (n + 1 - i) = a_i for each i.

Permute Set (σ(i)) = a_i for each i, given some σ ∈ S_n.

Compute Apply an arbitrary M ∈ GL_n(2) to the n wires.

The first two tasks require us to perform an operation on (1) and (n), leaving the other 
bits untouched. The next three tasks require us to reorder the bits; this might be useful if a 
quantum circuit will perform complex calculations on different subsets of the bits. The final 
task encompasses any possible linear computation. 

It is immediate that each of these tasks requires depth n - 1, since we need to move the
information in a_1 at least n - 1 times.² We encourage the reader to work out low-depth
solutions to the above problems before reading further. 

We will prove the following results. In each case, our proof is via an explicit construction. 

1 To implement reversible affine operations, we would need to allow unary negation gates as well. All such 
negations could be deferred to one final time-slice. 

2 For permutation and arbitrary computation, this lower bound applies in the worst case. 
Theorem 1.1. We can add across n wires in depth n + 4.

Theorem 1.2. We can swap across n wires in depth n + 8. 

Our swapping circuit works by moving a_1 and a_n to two adjacent wires in depth roughly
n/2, swapping the values, and then moving the wires back. Instead of swapping the values,
we could apply any 2-qubit gate to the two wires. So, we can apply any 2-qubit quantum gate
spanning n wires in depth n + O(1). In Section 3.1 we will generalize the above argument.
We can apply any m-qubit gate whose total span is at most n in depth n + O(m).

Theorem 1.3. We can rotate n wires in depth n + 5. 

Theorem 1.4. We can reverse n wires in depth 2n + 2.

We will show in Section 5.2 that reversal requires a depth of at least 2n + 1.

Theorem 1.5. For any σ ∈ S_n, there is a circuit implementing σ of depth at most 3n.

Theorem 1.6. For any M ∈ GL_n(2), there is a circuit implementing M of depth at most 5n.

We will show in Section [73 that, for any ε > 0, almost every matrix in GL_n(2) requires
depth at least (2 - ε)n. One natural problem is to close the gap between this lower bound
and the upper bound of 5n. We discuss this, and other open questions, in Section [HI 

2 Addition 

Theorem 2.1. We can add across n wires in depth n + 3 for even n and in depth n + 4 for
odd n. The circuit has size 4n - 7.

An example of the construction for n = 10 appears in Figure 1.

Proof. Let k = ⌈n/2⌉. We will construct a subcircuit of depth k + 1 and size 2n - 4 that
has the following effects:

1. (k) = a_1.

2. a_n contributes only to wire (k + 1).

Next, we perform (k + 1) ⊕= (k); this just replaces a_n by a_n ⊕ a_1 in the only location where
a_n appears. Finally, we undo the subcircuit. When we are done, we have (n) = a_n ⊕ a_1,
and each other wire has its initial value. The overall circuit size is 4n - 7, and the depth is
2k + 3 as desired.

It remains only to discuss the subcircuit, which is described in Figure 2. We begin with
the first two loops, or "cascades". The first loop writes a_i ⊕ a_{i+1} to (i) for i < k. After the
second loop, (i) contains a_1 ⊕ a_{i+1} for i < k, and (k) contains a_1. Notice that we can start
the second loop during the third time-slice, so the two cascades together have depth k + 1.

The third and fourth loops can be similarly analyzed. After both loops are completed,
we have written a_{i-1} to (i) (for i > k + 1) and ⊕_{j=k+1}^{n} a_j to (k + 1). As desired, a_n affects
only (k + 1). The depth is (n - 1 - k) + 2 ≤ k + 1. □
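The construction in this proof can be checked mechanically. The following is a minimal sketch, assuming the loop bounds of Figure 2 and k = ⌈n/2⌉; the helper names (`run`, `depth`, `add_subcircuit`, `add_circuit`) are hypothetical, and wire contents are encoded as bitmasks in which bit i - 1 stands for a_i. The depth is measured by a simple greedy scheduler that keeps the listed order of gates sharing a wire.

```python
def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s); wire (i) starts as a_i."""
    wires = [0] + [1 << i for i in range(n)]
    for t, s in gates:
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: each gate goes right after the last gate on either wire."""
    busy = [0] * (n + 2)
    for t, s in gates:
        busy[t] = busy[s] = max(busy[t], busy[s]) + 1
    return max(busy)


def add_subcircuit(n):
    """The four cascades of Figure 2, as (target, source) pairs."""
    k = (n + 1) // 2  # k = ceil(n / 2)
    gates = [(i, i + 1) for i in range(1, k)]            # (i) ^= (i+1), i = 1..k-1
    gates += [(i + 1, i) for i in range(1, k)]           # (i+1) ^= (i), i = 1..k-1
    gates += [(i, i + 1) for i in range(n - 1, k, -1)]   # i = n-1 down to k+1
    gates += [(i + 1, i) for i in range(n - 1, k, -1)]   # i = n-1 down to k+1
    return gates, k


def add_circuit(n):
    """Subcircuit, central gate (k+1) ^= (k), then the subcircuit undone."""
    sub, k = add_subcircuit(n)
    return sub + [(k + 1, k)] + sub[::-1]


gates = add_circuit(10)
print(run(gates, 10)[-1] == (1 << 9) | 1)  # wire (10) holds a_1 xor a_10
print(len(gates), depth(gates, 10))        # size and greedy depth
```

Because every gate is its own inverse, undoing the subcircuit is just replaying its gates in reverse order.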
Figure 1: Addition across 10 wires (k = 5) in depth 13. The central CNOT is shown in red.

for i = 1 to k - 1:
    (i) ⊕= (i + 1)
for i = 1 to k - 1:
    (i + 1) ⊕= (i)
for i = n - 1 down to k + 1:
    (i) ⊕= (i + 1)
for i = n - 1 down to k + 1:
    (i + 1) ⊕= (i)

Figure 2: Subcircuit for addition. We take k = ⌈n/2⌉.

3 Swap 

Theorem 3.1. We can swap across n wires in depth n + 7 for even n and in depth n + 8
for odd n. The circuit has size 6n - 9.

An example of this construction for n = 9 appears in Figure 3.

Proof. We use the same basic idea as in the proof of Theorem 2.1. As before, let k = ⌈n/2⌉.
Before, we built a subcircuit guaranteeing that (k) = a_1 and that a_n contributes only to wire
(k + 1). For a swap, we need something stronger:

1. (k) = a_1.

2. (k + 1) = a_n.

3. No other wire depends on a_1 or a_n.

Our subcircuit will have size 3n - 6 and depth k + 3.

We begin by running the subcircuit. Next, we swap (k) and (k + 1); this requires three
gates. Finally, we undo the subcircuit. The overall size is 6n - 9.
Figure 3: Swap across 9 wires (k = 5) in depth 17. The central swap is shown in red.

for i = 1 to k - 1:
    (i) ⊕= (i + 1)
for i = 1 to k - 1:
    (i + 1) ⊕= (i)
for i = 1 to k - 1:
    (i) ⊕= (i + 1)
for i = n - 1 down to k + 1:
    (i) ⊕= (i + 1)
for i = n - 1 down to k + 1:
    (i + 1) ⊕= (i)
for i = n - 1 down to k + 1:
    (i) ⊕= (i + 1)

Figure 4: Subcircuit for swap. We take k = ⌈n/2⌉.
The subcircuit is described in Figure 4. The first two loops are the same as in Figure 2.
We write a_1 + a_{i+1} to (i) (for i < k) and a_1 to (k). The next loop erases the a_1 information;
when it concludes, we have (i) = a_{i+1} + a_{i+2} for i < k - 1, (k - 1) = a_k, and (k) = a_1. As
before, we can nest the cascades (see Figure 3); the depth is k + 3.

The remaining loops are similar. After the penultimate loop, we have (i) = a_{i-1} for
i > k + 1 and (k + 1) = ⊕_{j=k+1}^{n} a_j. The final loop accumulates upward; we obtain (i) =
⊕_{j=i-1}^{n-1} a_j for i > k + 1, and (k + 1) = a_n. The depth is (n - 1 - k) + 4 ≤ k + 3.

Since the subcircuit has depth k + 3, and the central swap has depth 3, one might think
the overall depth would be 2k + 9. In fact, we can reduce the depth to 2k + 7. Two of the
three gates in the swap commute with adjacent gates and can be nested into the subcircuit,
as shown in Figure 3. □
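As a sanity check on this proof, the swap circuit can be simulated. This is a sketch under the assumption that Figure 4 consists of the six cascades described in the text; the function names are hypothetical and wire contents are bitmasks (bit i - 1 stands for a_i). The greedy scheduler used here keeps the listed order of gates that share a wire, so it does not perform the commutation that brings the depth down to 2k + 7; it only confirms the bound 2k + 9.

```python
def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s); wire (i) starts as a_i."""
    wires = [0] + [1 << i for i in range(n)]
    for t, s in gates:
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: each gate goes right after the last gate on either wire."""
    busy = [0] * (n + 2)
    for t, s in gates:
        busy[t] = busy[s] = max(busy[t], busy[s]) + 1
    return max(busy)


def swap_circuit(n):
    """Subcircuit of Figure 4, a three-gate swap of (k) and (k+1), then the undo."""
    k = (n + 1) // 2  # k = ceil(n / 2)
    up = [(i, i + 1) for i in range(1, k)]             # (i) ^= (i+1)
    down = [(i + 1, i) for i in range(1, k)]           # (i+1) ^= (i)
    low_up = [(i, i + 1) for i in range(n - 1, k, -1)]
    low_down = [(i + 1, i) for i in range(n - 1, k, -1)]
    sub = up + down + up + low_up + low_down + low_up
    central = [(k, k + 1), (k + 1, k), (k, k + 1)]
    return sub + central + sub[::-1]


gates = swap_circuit(9)
print(run(gates, 9))               # a_9 on wire (1), a_1 on wire (9)
print(len(gates), depth(gates, 9)) # size and greedy depth
```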



3.1 Arbitrary Quantum Gates 

As noted in the Introduction, we could replace the central swap with any operation on a_1
and a_n; in the quantum setting, we could use any 2-qubit gate. Hence, any 2-qubit gate
spanning n wires can be implemented in depth n + O(1).

Suppose that we wish to implement an m-qubit gate with span n. We need to operate
on a set of bits (i_1), ..., (i_m) with 1 = i_1 < i_2 < ... < i_m = n. Write b_ℓ = a_{i_ℓ}. Let k = ⌈n/2⌉
as above, and choose j with i_j ≤ k < i_{j+1}.

For each ℓ between 1 and m, we will move b_ℓ onto the wire (k - j + ℓ), so the bits
will lie on m adjacent wires. We then perform the m-qubit gate. Finally, we undo the
transformation.

We will begin with nested cascades as in our swap circuit; we use the top half of the
subcircuit of Figure 4, but we only let i range from i_j to k - 1. When we finish, we have
(k) = b_j, and no other wire depends on b_j. The wires between (i_j) and (k - 1) contain some
complicated functions of various a_i bits, but none of the b_ℓ bits are involved.

Next, if j > 1, we perform cascades moving b_{j-1} to (k - 1). We continue, performing a
series of j sets of cascades; the final set moves b_1 into (k - j + 1). Since the cascades nest,
the total depth is k + O(m).

At the same time, we perform upward cascades moving b_{j+1} to (k + 1), b_{j+2} to (k + 2),
and so on, up to moving b_m to (k - j + m). After k + O(m) time-slices, we have moved the
m bits of interest onto the wires from (k - j + 1) to (k - j + m).

Finally, we perform the m-qubit gate, and we reverse the first part of the computation to
put all the bits back. The overall depth is n + O(m), in addition to the cost of the m-qubit
quantum gate.

Moreover, suppose we wish to perform several long-range gates spanning n wires, and
using a total of m bits, simultaneously. We first move those m bits together in depth
n + O(m). Next, we permute the bits in depth O(m) (see Section 6), so the bits for each gate
are adjacent. We now perform the quantum gates and then undo the rest of the calculation.
The total depth is again n + O(m), in addition to the cost of the most complicated quantum
gate.
4 Rotation

Recall that rotating n wires means setting (n) to a_1 and setting (i) to a_{i+1} for each other i.

Theorem 4.1. For n > 2, we can rotate n wires in depth n + 5. The circuit has size 4n - 6.

Figure 5: Rotation of 10 wires (k = 5) in depth 15.

We first give a rotation circuit of depth 2n + 1. We then explain how to use this circuit
in our main construction. An example of the final result with n = 10 is depicted in Figure 5.

Lemma 4.2. We can rotate n wires in depth 2n + 1. The circuit has size 4n - 5.

Proof. We consider the rotation circuit of Figure 6, which we call R(ℓ, m).

The first three loops of R(ℓ, m) are similar to those in Figure 4. After the first loop, we
have (j) = ⊕_{i=ℓ}^{j} a_i for ℓ ≤ j ≤ m. The second loop leaves (m) = ⊕_{i=ℓ}^{m} a_i, and sets each
other (j) to a_{j+1}. The third loop sets (j) to ⊕_{i=ℓ+1}^{j+1} a_i for j < m, but sets (m) = a_ℓ. The
final loop restores (j) to a_{j+1} for j < m.

The circuit R(ℓ, m) contains 4(m - ℓ) - 1 gates. The first three loops can be nested, for
a combined depth of (m - ℓ) + 4. The total depth is 2(m - ℓ) + 3. If we take ℓ = 1 and
m = n, we obtain a rotation of all n wires. □



for i = ℓ to m - 1:
    (i + 1) ⊕= (i)
for i = ℓ to m - 1:
    (i) ⊕= (i + 1)
for i = ℓ to m - 1:
    (i + 1) ⊕= (i)
for i = m - 2 down to ℓ:
    (i + 1) ⊕= (i)

Figure 6: Rotation circuit R(ℓ, m).
Note that if we flip each gate in R(ℓ, m) upside-down, the resulting circuit still performs
a rotation. More generally, the circuit formed by flipping each gate of a given circuit
upside-down performs the inverse transpose of the GL_n(2) transformation performed by the
original circuit.

Proof of Theorem 4.1. Let k = ⌈n/2⌉. We let R(ℓ, m) be the circuit of Lemma 4.2.

We let R'(ℓ, m) be the circuit R(ℓ, m) run upside-down and backward. Note that running
a rotation circuit upside-down makes it rotate in the opposite direction, and running any
circuit backward makes it perform the inverse operation. So, R'(ℓ, m) has the same effect
as R(ℓ, m).

We define a circuit C as follows:

1. Apply R(1, k).

2. Apply R'(k, n).

First, note that the first half of C sets (j) = a_{j+1} for j < k, and (k) = a_1. Consequently,
the second half of C completes the rotation. So, C rotates n wires as desired. Clearly the
size of C is 4n - 6.

The only bit used both by R(1, k) and R'(k, n) is (k). Note that R(1, k) is done accessing
bit (k) after k + 3 time-slices, and R'(k, n) does not access (k) until time-slice n - k. Hence,
the total depth of the circuit is (k + 3) + ((n - k) + 4) = n + 7.

We can further reduce the depth to n + 5. The last access of (k) by R(1, k) and the first
access by R'(k, n) both write to (k). These two operations commute with each other. By
swapping the order, we can start R'(k, n) two time-slices sooner. □
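The rotation construction can also be simulated. The sketch below assumes the four loops of Figure 6 as read from the proof of Lemma 4.2; `R`, `R_prime` and `rotate_circuit` are hypothetical names, and wire contents are bitmasks (bit i - 1 stands for a_i). The greedy scheduler keeps the order of gates that share a wire, so it reproduces the n + 7 bound from the proof rather than the improved n + 5.

```python
def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s); wire (i) starts as a_i."""
    wires = [0] + [1 << i for i in range(n)]
    for t, s in gates:
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: each gate goes right after the last gate on either wire."""
    busy = [0] * (n + 2)
    for t, s in gates:
        busy[t] = busy[s] = max(busy[t], busy[s]) + 1
    return max(busy)


def R(l, m):
    """Rotation circuit on wires l..m: (m) gets a_l, every other (j) gets a_{j+1}."""
    gates = [(i + 1, i) for i in range(l, m)]
    gates += [(i, i + 1) for i in range(l, m)]
    gates += [(i + 1, i) for i in range(l, m)]
    gates += [(i + 1, i) for i in range(m - 2, l - 1, -1)]  # i = m-2 down to l
    return gates


def R_prime(l, m):
    """R(l, m) run upside-down (wire w becomes l + m - w) and backward."""
    flipped = [(l + m - t, l + m - s) for t, s in R(l, m)]
    return flipped[::-1]


def rotate_circuit(n):
    """The circuit C of Theorem 4.1: R(1, k) followed by R'(k, n)."""
    k = (n + 1) // 2  # k = ceil(n / 2)
    return R(1, k) + R_prime(k, n)


gates = rotate_circuit(10)
print(run(gates, 10))                 # a_2, ..., a_10, a_1 as bitmasks
print(len(gates), depth(gates, 10))   # size and greedy depth
```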

5 Reversal 

We now give a construction reversing the contents of n wires in depth 2n + 2. We then show 
that any such circuit has depth at least 2n + 1. 

5.1 Upper bound on reversal 

Theorem 5.1. We can reverse n wires in depth 2n + 2. The circuit has size n^2 - 1.

An example of this construction for n = 9 appears in Figure 7.

Proof. The reversal circuit is described in Figure 8. The subcircuit R_0 adds the contents
of each wire with an even index into its neighbors; the subcircuit R_1 adds the odd-indexed
wires into their neighbors. We alternate between these two operations.

For a given value of i, we will keep track of which wires depend on a_i over time. First
suppose that i is even. To simplify matters, we will see what the effect of successive
applications of R_0 and R_1 would be if there were wires corresponding to arbitrarily small and
large integers. After the first application of R_0, since i is even, we perform (i - 1) ⊕= (i)
and (i + 1) ⊕= (i), so a_i gets added to (i - 1) and (i + 1). Thus a_i affects (i - 1), (i), and
(i + 1). After we next apply R_1, (i - 1) and (i + 1) are added to their neighboring wires, so
a_i affects (i - 2) through (i + 2). (The effects of the two additions to (i) cancel.) In general,
after applying R_0 and R_1 a total of t times, a_i will affect (i - t) through (i + t).

Now let us take into account the fact that we only have wires (1) through (n). During the
ith application of an R-subcircuit, we cannot add (1) to (0), since the latter does not exist,
so (1) is still the lowest-numbered wire affected by a_i. During the (i + 1)th application of an
R-subcircuit, (2) is added to (1), so (1) no longer depends on a_i, and (2) is now the first wire
affected by a_i. Therefore, after t applications of R-subcircuits, for t > i, the lowest-numbered
wire affected by a_i is (t - i + 1). Similarly, for t > n - i, the highest-numbered wire affected
by a_i is (n - (t - 1 - (n - i))) = (2n - t - i + 1). (We can see this by interchanging i and
n + 1 - i, relabeling the wires in the opposite order, and interchanging R_0 and R_1 if n is
even.) That is, for t bigger than both i and n - i (and not too large), a_i will affect exactly
the wires (t - i + 1) through (2n - t - i + 1) after t applications of R-subcircuits.

Our circuit applies R-subcircuits a total of n + 1 times. After n of these, a_i affects
exactly wires (n + 1 - i) through (n + 1 - i); that is, a_i affects only (n + 1 - i). Since this
nth application writes to wires of the opposite parity of (n + 1 - i), the (n + 1)th application
will write to wires of the same parity as (n + 1 - i), and (n + 1 - i) will still be the only
wire affected by a_i.

Finally, we consider the case with i odd. After the first application of an R-subcircuit,
(i) is still the only wire affected by a_i. Then, as above, after n more applications, (n + 1 - i)
is the sole wire affected by a_i.

We have shown that, after our circuit runs, the wire (n + 1 - i) will depend on a_i, but
no other wire (n + 1 - j) for j ≠ i will. Turning this around, we see that the final value of
(n + 1 - i) does not depend on a_j for j ≠ i, so that this final value must, in fact, be equal
to a_i. We have performed reversal, as desired. □

For n = 2, the subcircuits R_0 and R_1 each have depth 1, so the overall depth of our
reversal circuit is 3. For n > 2, the depth is 2n + 2.
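The alternating construction is short enough to test directly. The following sketch assumes that R_0 adds each even-indexed wire into its neighbors, R_1 adds each odd-indexed wire into its neighbors, and that the circuit applies R_0, R_1, R_0, ... a total of n + 1 times, as described in the proof. The function names and the bitmask encoding (bit i - 1 stands for a_i) are illustrative choices, not part of the paper.

```python
def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s); wire (i) starts as a_i."""
    wires = [0] + [1 << i for i in range(n)]
    for t, s in gates:
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: each gate goes right after the last gate on either wire."""
    busy = [0] * (n + 2)
    for t, s in gates:
        busy[t] = busy[s] = max(busy[t], busy[s]) + 1
    return max(busy)


def reversal_circuit(n):
    """Apply R_0 and R_1 alternately, n + 1 times in total, starting with R_0."""
    # R_0: even-indexed wires are added into their odd-indexed neighbors.
    r0 = [(2 * i - 1, 2 * i) for i in range(1, n // 2 + 1)]
    r0 += [(2 * i + 1, 2 * i) for i in range(1, (n - 1) // 2 + 1)]
    # R_1: odd-indexed wires are added into their even-indexed neighbors.
    r1 = [(2 * i, 2 * i - 1) for i in range(1, n // 2 + 1)]
    r1 += [(2 * i, 2 * i + 1) for i in range(1, (n - 1) // 2 + 1)]
    gates = []
    for step in range(n + 1):
        gates += r0 if step % 2 == 0 else r1
    return gates


gates = reversal_circuit(9)
print(run(gates, 9))                # a_9, a_8, ..., a_1 as bitmasks
print(len(gates), depth(gates, 9))  # size and greedy depth
```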
Figure 7: Reversal of 9 wires in depth 20.

R_0:
    for i = 1 to ⌊n/2⌋:
        (2i - 1) ⊕= (2i)
    for i = 1 to ⌊(n - 1)/2⌋:
        (2i + 1) ⊕= (2i)

R_1:
    for i = 1 to ⌊n/2⌋:
        (2i) ⊕= (2i - 1)
    for i = 1 to ⌊(n - 1)/2⌋:
        (2i) ⊕= (2i + 1)

Reversal:
    for T = 0 to n:
        if T is even: Apply R_0
        else: Apply R_1

Figure 8: Reversal circuit. For n > 2, the subcircuits R_0 and R_1 each have depth 2.

5.2 Lower bound on reversal 

For 2 ≤ n ≤ 6, computer searches confirm that the above construction is optimal. We
conjecture that the depth of any circuit performing reversal for n ≥ 3 is at least 2n + 2. We
now show that any such circuit has depth at least 2n + 1.

Lemma 5.2. For any k ≤ n/2, any circuit reversing n wires contains at least 2k + 1 gates
between wires (k) and (k + 1) and also between wires (n - k) and (n - k + 1). If k is not
n/2, then there must be at least 2k + 1 such gates before the last time-slice.

Proof. Let R be a circuit reversing (1), ..., (n). We show that R must have at least 2k + 1
gates between wires (k) and (k + 1); the proof for (n - k) and (n - k + 1) is analogous.
We write the contents of the wires at any given time as a block matrix

    M = ( W  X )
        ( Y  Z ),        (1)

where W is k × k, X is k × (n - k), Y is (n - k) × k, and Z is (n - k) × (n - k). The matrix
M changes as we apply R. Initially, W and Z are identity matrices of sizes k and n - k,
and X and Y are 0. When we conclude, W, X, Y, and Z have ranks 0, k, k, and n - 2k,
respectively.

The ranks of W, X, Y, and Z are affected only by gates between wires (k) and (k + 1).
Each upward gate (k) ⊕= (k + 1) changes the ranks of W and Y by at most 1, and each
downward gate (k + 1) ⊕= (k) changes the ranks of X and Z by at most 1. Each of the four
ranks has to change by k. We conclude that there are at least k upward and k downward
gates in R.

Furthermore, suppose that the first gate between (k) and (k + 1) is upward. At this
point X is still 0, so the gate cannot affect the rank of W; the circuit R requires k more
upward gates. Similarly, if the first gate is downward, it cannot affect the rank of Z, and R
requires k additional downward gates. Hence, there must be at least 2k + 1 gates between
(k) and (k + 1).

Finally, if k is not exactly n/2, then any gate in the last time-slice cannot affect any
of the ranks of W, X, Y, Z, so all of the gates accounted for above must occur in earlier
time-slices. □



Theorem 5.3. Reversing n ≥ 3 wires requires depth at least 2n + 1 and size at least
⌊n^2/2⌋ + n.

Proof. First, suppose n = 2r. Given any circuit R for reversal, we obtain another reversal
circuit by vertically flipping the last time-slice of R (that is, conjugating it by reversal)
and moving it to the beginning of the circuit. We may therefore assume, without loss of
generality, that the last time-slice contains a gate between (r + 1) and (r + 2).

By Lemma 5.2, there are at least 2r + 1 gates between wires (r) and (r + 1) and at least
2(r - 1) + 1 = 2r - 1 gates between wires (r + 1) and (r + 2) before the last time-slice.
Hence, there are at least 4r + 1 = 2n + 1 gates involving (r + 1), giving the lower bound on
depth. If we sum over all locations, we find that the total number of gates is at least

    2r + 1 + 2 Σ_{i=1}^{r-1} (2i + 1) + 1 = n^2/2 + n.

Second, suppose n = 2r + 1. Again, we may assume that the last time-slice contains
a gate between (r + 1) and (r + 2). Now we have at least 2r + 1 gates between wires (r)
and (r + 1) and at least 2r + 2 gates between (r + 1) and (r + 2). This gives a total of
4r + 3 = 2n + 1 gates involving (r + 1), meaning we must have at least 2n + 1 time-slices.
The total number of gates is at least

    2 Σ_{i=1}^{r} (2i + 1) + 1 = (n^2 - 1)/2 + n.

□
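The counting in this proof is easy to check numerically. The sketch below sums the per-location bound of Lemma 5.2 over all n - 1 pairs of adjacent wires and adds the one extra gate from the last time-slice; the function name `reversal_size_lower_bound` is hypothetical. It compares the result with ⌊n^2/2⌋ + n and with the size n^2 - 1 of the construction in Theorem 5.1.

```python
def reversal_size_lower_bound(n):
    """Gate-count lower bound for reversing n >= 3 wires.

    Location k (between wires (k) and (k+1)) needs at least
    2 * min(k, n - k) + 1 gates, and one further gate is forced
    in the last time-slice.
    """
    per_location = [2 * min(k, n - k) + 1 for k in range(1, n)]
    return sum(per_location) + 1


for n in (3, 4, 9, 10):
    lower = reversal_size_lower_bound(n)
    print(n, lower, n * n // 2 + n, n * n - 1)  # bound, closed form, upper bound
```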



6 Permutation 

We now discuss the more general problem of permuting the n input bits. It is easier to
visualize the problem by imagining that the wire (i) contains the data a_i with the attached
label σ(i). We then wish to sort the data by their labels. When we finish, the wire (i) will
have the label i, and hence the bit a_{σ^{-1}(i)}, as desired.

Theorem 6.1. For any σ ∈ S_n, there is a circuit implementing σ with depth at most 3n
and size at most 3(n choose 2).
Proof. It is convenient to pretend that our basic operation is a swap of two adjacent bits;
we can implement such a swap using three of our standard gates. Initially, our labels are in
the order σ(1), ..., σ(n); after each swap, the order changes. When the circuit completes,
we want the labels to be sorted.

To effect the swaps, we use an n-bit sorting network. The basic gate is a conditional
swap on (i) and (j): if i < j but the label on (i) is larger than the label on (j), then we
swap the contents and labels of the two wires. A network of conditional swaps is a sorting
network if, for any (valid) initial assignment of labels, at the end wire (i) has label i. We
are interested in sorting networks using only conditional swaps on adjacent wires. See [3,
Section 5.3.4] for more discussion.

Suppose we have an n-bit sorting network of depth d and size s, in which each conditional
swap is between two adjacent wires. We will perform each swap only if the label of the
second bit is less than that of the first bit. Since we know σ in advance, we know which
swaps to leave in the network and which to leave out. The result will be a swap network
with depth at most d and size at most s. The corresponding circuit has depth at most 3d
and size at most 3s.



Figure 9: 7-wire sorting network in depth 7.



It merely remains to construct an efficient sorting network using only conditional swaps of
adjacent wires. We use the odd-even transposition sort.³ It has n steps, alternating between
performing all conditional swaps of the form (2j - 1, 2j) and performing all conditional
swaps of the form (2j, 2j + 1). An example with n = 7 is depicted in Figure 9. We have
s = n(n - 1)/2 and d = n (unless n = 2, in which case d = 1). □
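The proof translates into a short procedure: run the odd-even transposition sort on the labels, and emit a three-gate swap whenever a conditional swap actually fires. The sketch below follows that outline; the names `permutation_circuit`, `run` and `depth` are hypothetical, `sigma` is passed as a tuple with `sigma[i - 1]` equal to σ(i), and wire contents are bitmasks (bit i - 1 stands for a_i).

```python
from itertools import permutations


def run(gates, n):
    """Apply gates (t, s), meaning wire (t) ^= wire (s); wire (i) starts as a_i."""
    wires = [0] + [1 << i for i in range(n)]
    for t, s in gates:
        assert abs(t - s) == 1 and 1 <= min(t, s) and max(t, s) <= n
        wires[t] ^= wires[s]
    return wires[1:]


def depth(gates, n):
    """Greedy depth: each gate goes right after the last gate on either wire."""
    busy = [0] * (n + 2)
    for t, s in gates:
        busy[t] = busy[s] = max(busy[t], busy[s]) + 1
    return max(busy)


def permutation_circuit(sigma):
    """Gates that move a_i to wire (sigma(i)), via odd-even transposition sort."""
    n = len(sigma)
    label = [0] + list(sigma)  # label[i] is the label currently on wire (i)
    gates = []
    for step in range(1, n + 1):
        start = 1 if step % 2 == 1 else 2  # pairs (1,2),(3,4),... or (2,3),(4,5),...
        for i in range(start, n, 2):
            if label[i] > label[i + 1]:
                label[i], label[i + 1] = label[i + 1], label[i]
                # A swap of adjacent wires costs three gates.
                gates += [(i, i + 1), (i + 1, i), (i, i + 1)]
    assert label[1:] == sorted(sigma)  # n steps suffice to sort the labels
    return gates


sigma = (3, 1, 4, 5, 2)
gates = permutation_circuit(sigma)
print(run(gates, 5))                # wire (sigma(i)) holds a_i
print(len(gates), depth(gates, 5))  # at most 3 * C(5, 2) gates, depth at most 15

# Exhaustive check over all permutations of five wires.
for perm in permutations(range(1, 6)):
    out = run(permutation_circuit(perm), 5)
    assert all(out[perm[i] - 1] == 1 << i for i in range(5))
```

Since σ is known in advance, the conditional swaps are resolved while the circuit is being built, so the resulting circuit contains only unconditional gates.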

We observe that the above sorting network achieves the optimal d and s. First, note that
each swap reduces the number of inversions by at most one. Since σ can have up to (n choose 2)
inversions, we must have s ≥ (n choose 2).

In addition, in an optimal sorting network, we cannot perform the same swap in
consecutive time-slices. Thus, in any two consecutive time-slices, we can perform at most n - 1
swaps. Hence, for all n > 2, we need at least ⌊n/2⌋ pairs of time-slices to accommodate
(n - 1)⌊n/2⌋ gates, plus (at least) one more time-slice if n is odd. Hence, for all n > 2 we
have d ≥ n.

3 See Knuth [3, Exercise 5.3.4.37] for a proof of correctness and a brief history.

Clearly, for a particular permutation, we may be able to do better than Theorem 6.1 would
suggest; see, for example, Sections 3, 4, and 5. A more difficult problem is determining the
minimum depth for the worst possible σ.

For n ≤ 6, reversal is at least as hard as any other permutation: we can implement any
permutation in depth 2n + 2. We do not know whether this pattern holds for larger n.

7 Arbitrary Matrices 

As noted in the Introduction, any circuit on n wires made up of CNOT gates computes a
matrix in GL_n(2). Conversely, given a matrix, it is straightforward to build a circuit with
depth O(n^2).

More concretely, we suppose the initial state of the wires is described by the identity
matrix I; each wire (i) contains the basis vector e_i. If a circuit C applied to this initial state
I results in state M, we say that C performs the transformation M. This map from circuits
to matrices is a homomorphism.

The problems of building a circuit performing M and a circuit performing M^{-1}, for
an arbitrary invertible matrix M, are equivalent. Notationally, we find the latter more
convenient. Instead of building a circuit to perform M, we suppose the wires start in state
M, and we construct a circuit to "undo" M and restore I. The reverse of this circuit will
perform M.
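The correspondence between circuits and matrices can be illustrated with plain Python lists. This is a small sketch, not the paper's code: `gate_matrix`, `matmul` and `circuit_matrix` are hypothetical helpers. The gate (t) ⊕= (s) adds column s into column t, which is right multiplication by the identity matrix with an extra 1 in row s, column t.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def gate_matrix(n, t, s):
    """Elementary matrix of the gate (t) ^= (s): identity plus a 1 at row s, column t."""
    e = identity(n)
    e[s - 1][t - 1] = 1
    return e


def matmul(a, b):
    """Matrix product over F_2."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def circuit_matrix(n, gates):
    """Matrix performed by a circuit, starting from the identity state."""
    m = identity(n)
    for t, s in gates:
        m = matmul(m, gate_matrix(n, t, s))  # circuits act on the right
    return m


first, second = [(2, 1), (3, 2)], [(1, 2), (2, 3)]
m = circuit_matrix(3, first + second)

# Concatenating circuits multiplies their matrices (the homomorphism property).
print(m == matmul(circuit_matrix(3, first), circuit_matrix(3, second)))

# Starting from state M, the reversed circuit restores the identity.
print(matmul(m, circuit_matrix(3, (first + second)[::-1])) == identity(3))
```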

In this section we give a constructive proof of the following result: 

Theorem 7.1. Let M be a matrix in GL_n(2). Then there is a circuit computing M with
depth at most 5n.

Our construction uses the concept of a "northwest"-triangular matrix.

Definition 7.2. An n × n matrix M is northwest-triangular if M_{ij} = 0 for all i + j > n + 1.

We discuss the building blocks of our circuit in Section 7.1. In Sections 7.2 and 7.3 we
prove the following propositions:

Proposition 7.3. Let M be in GL_n(2). Given an n-wire sorting network of depth d, we can
construct a circuit C of depth 2d such that MC is northwest-triangular.

Proposition 7.4. Let N be an invertible northwest-triangular matrix. Given an n-wire
sorting network of depth d, we can construct a circuit R of depth 3d with NR = I.

Proof of Theorem 7.1. Let M be any matrix in GL_n(2). By Proposition 7.3, using the
odd-even transposition network of depth n, there is a circuit C of depth 2n such that MC is
northwest-triangular. By Proposition 7.4 (using the same network), there is a circuit R of
depth 3n with MCR = I. Then R^{-1}C^{-1} computes M. □
The maximum possible size (that is, number of gates) of a depth-d circuit is d⌊n/2⌋. The
density of a circuit is its size divided by this maximum. The construction of Theorem 7.1
has size about (5/2)n^2 and density 1. We also have a construction with size about 2n^2 and
density 1/2. (See Section [HI ) Note that, if we could construct a circuit with size 2n^2 and
density 1, we would have a solution with depth 4n. We discuss this, and other reasons why
we conjecture that circuits of depth 4n + O(1) may be possible, in Section [HJ

7.1 Boxes 

The building blocks for our circuits will be not individual CNOT gates, but boxes:

Definition 7.5. A box is a subcircuit on two adjacent wires (i) and (i + 1).

Every box performs some operation in GL_2(2). If u and v are the contents of the two input
wires to a box, then the two output wires contain distinct elements of {u, v, u ⊕ v}. Some
researchers (for example, [2]) compute the costs of quantum circuits by counting arbitrary
2-qubit interactions; in such a model, the box is the fundamental unit.

Table 1: Depth of implementing boxes with input u, v.

First Output    Second Output    Depth
u               v                0
u ⊕ v           v                1
u               u ⊕ v            1
u ⊕ v           u                2
v               u ⊕ v            2
v               u                3

The depth of a box depends on the two output vectors, as shown in Table 1. If we want
to perform an arbitrary operation in GL_2(2), then the depth of our box could be as large
as 3. However, if we only specify one of the two outputs, and allow the other output to take
whichever value is more convenient, we see that we can make do with boxes of depth 2.
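The box depths can be recomputed by a breadth-first search over the two available gates on a pair of wires. The sketch below is illustrative only: the names `box_depths` and `one_output_depth` are hypothetical, and the vectors are encoded as u = 1, v = 2, u ⊕ v = 3. On two wires every gate occupies its own time-slice, so the number of gates equals the depth.

```python
def box_depths():
    """Minimum depth needed to reach each output pair from the input (u, v)."""
    start = (1, 2)  # u = 1, v = 2, and u xor v = 3
    best = {start: 0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for x, y in frontier:
            # The two gates: first wire ^= second wire, second wire ^= first wire.
            for state in ((x ^ y, y), (x, x ^ y)):
                if state not in best:
                    best[state] = best[(x, y)] + 1
                    next_frontier.append(state)
        frontier = next_frontier
    return best


def one_output_depth(best, wire, value):
    """Cheapest box whose output on the given wire (0 or 1) is the given value."""
    return min(d for state, d in best.items() if state[wire] == value)


best = box_depths()
names = {1: "u", 2: "v", 3: "u+v"}
for (x, y), d in sorted(best.items(), key=lambda item: item[1]):
    print(names[x], names[y], d)

# If only one output is specified, depth 2 always suffices.
print(max(one_output_depth(best, w, val) for w in (0, 1) for val in (1, 2, 3)))
```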

7.2 Clearing Networks 

We now prove Proposition 17.31 We use a sorting network to build a system of depth-2 boxes 
to convert any matrix into northwest-triangular form. 



Proof of Proposition \7.3[ We first perform a lower-triangular basis change; this does not 



involve changing the contents of any wires, but merely describes them differently. We then 
construct a circuit. 
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Let V = F2 be the space containing our wires. We define a lexicographic order on V. 
For u, v G V, we write u -< v if there exists k such that u ■ = 0, v ■ = 1, and, for all 
j > k, u ■ ej = v ■ ej. 

For each i, let t>j be the lexicographically least element of aj©span{aj : i < j < n}. Note 
that this is a lower-triangular basis change: for each i, a\ G v i © span{t>j : i < j < n}. 

For each i, let be the smallest j such that 1^ • e n+1 _j = 1. By construction, 7r is a 
permutation: for any k < £, we must have v & -< t> & © u^, and therefore 7r(fc) 7^ 7r(-£). 

Let to, = v^-iij\. The 10 j satisfy 

w j G e n+ i_j © spanje/j : 1 < k < n + 1 — j}. 

Attach to each wire (i) the label ir(i). We maintain the following invariant: 

• If a wire has value Y2 a j w ji anc ^ ^ ^ s ^ e label on some lower-numbered wire, then 
a k = 0. 

The invariant is true initially because a j w j = Yl a ^(i) v i an d the basis change is lower- 
triangular. If we sort the labels and maintain the invariant, then when we are done, the 
value of wire (i) is in 

Wi © spanjifj : i < j < n} = e n +i-i © spanjej : 1 < j < n + 1 — i}, 

so the wires specify a northwest-triangular matrix. 

We now build a circuit C that sorts the labels while maintaining the invariant. We 
start with a sorting network S of depth d and replace each conditional swap in S by a box. 
Suppose we have two inputs to a box: (i) has value u and label j, and (i + 1) has value v 
and label k. If j < k, we do nothing. If j > k, we swap the two labels, and we also perform 
a box as described below. When the network concludes, we will have sorted the labels, as 
desired. 

Let $W$ be the span of all $w_\ell$ for $\ell \ne k$. The space $W$ has codimension 1, so at least one
of $\{u, v, u \oplus v\}$ lies in $W$: if neither $u$ nor $v$ lies in $W$, then their sum does. We can
perform a box on (i) and (i + 1) that writes a vector in $W$ to wire (i + 1). This maintains the
invariant for wires (i) and (i + 1), as desired, and other wires are unaffected.

Each box in C comes from a conditional swap in S. We are specifying only one output 
of each box, so each box has depth at most 2. Hence, the depth of C is at most 2d. □ 
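The choice made inside each box can be sketched as follows. This is an illustration only: it assumes the codimension-1 space $W$ is given as the kernel of a nonzero linear functional, encoded here as a bitmask `f` with f(x) equal to the parity of `x & f`. In the proof the functional reads off the coefficient $a_k$ in the $w$-basis; the standard-basis mask and the names `parity` and `choose_output` are stand-ins.

```python
def parity(x):
    """Parity of the number of set bits in x."""
    return bin(x).count("1") & 1


def choose_output(u, v, f):
    """Return one of u, v, u ^ v lying in W = {x : parity(x & f) == 0}."""
    for candidate in (u, v, u ^ v):
        if parity(candidate & f) == 0:
            return candidate
    # Not reachable: f(u ^ v) = f(u) + f(v) over F_2, so the three values
    # f(u), f(v), f(u ^ v) cannot all equal 1.
    raise AssertionError("no candidate in W")


print(choose_output(0b101, 0b011, 0b001))  # 6, since both inputs have bit 0 set
```

Because only the value written to wire (i + 1) is constrained, the box can use the depth-2 construction described in the text.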

7.3 Reversal Networks 

We now prove Proposition 7.4: we reduce any northwest-triangular matrix to the identity.
As before, we use a sorting network to build a system of boxes. However, in this case our 
boxes are permitted to have depth 3. 

Proof of Proposition 7.4. We first label each input wire (i) with $n + 1 - i$. We take a
sorting network $S$ of depth $d$ and convert $S$ to a reversal network; we include exactly those
conditional swaps that are used when input wire (i) has label $n + 1 - i$. (The new network
will have size $\binom{n}{2}$; if $S$ has the minimal size $\binom{n}{2}$, then it already is a reversal network.) We
make each remaining swap unconditional: we definitely swap the two labels.

Consider a swap between (i), with value $u$ and label $k$, and (i + 1), with value $v$ and
label $j$. Note that $k > j$. If $u \cdot e_j = 0$, we replace the swap with a depth-3 box exchanging $u$
and $v$. If $u \cdot e_j = 1$, then we replace the swap with the depth-2 box that first adds $u$ into $v$
and then adds $u \oplus v$ into $u$; this has the effect of replacing $u$ by $u \oplus v$ and then exchanging
(the new) $u$ and $v$.
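The two boxes can be written out gate by gate. In this small sketch the wire contents are integer bitmasks and each assignment is one gate adding a wire into its neighbor; the function names `swap_box` and `add_then_swap_box` are illustrative.

```python
def swap_box(u, v):
    """Depth-3 box: three alternating XOR gates exchange the two wires."""
    v ^= u   # wire (i+1) holds u + v
    u ^= v   # wire (i) holds v
    v ^= u   # wire (i+1) holds u
    return u, v


def add_then_swap_box(u, v):
    """Depth-2 box: maps (u, v) to (v, u + v)."""
    v ^= u   # wire (i+1) holds u + v
    u ^= v   # wire (i) holds v
    return u, v


print(swap_box(0b101, 0b011))           # (3, 5)
print(add_then_swap_box(0b101, 0b011))  # (3, 6)
```

The depth-2 box is the depth-3 swap with its last gate omitted, which is why the value leaving on wire (i + 1) is $u \oplus v$ rather than $u$.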

We claim that this circuit maintains the following invariants: 

1. If $u$ is on the wire with label $k$, then $u \cdot e_k = 1$ and $u \cdot e_\ell = 0$ for $\ell > k$.

2. If $u$ is on (i) with label $k$, and (h) has label $j$, with $h < i$ and $j < k$, then $u \cdot e_j = 0$.

Initially, (i) has label n + 1 — i. The first invariant holds because N is an invertible 
northwest-triangular matrix. The second invariant holds vacuously, as there are no such 
pairs of wires. 

What is the effect of a single box between wires (i) and (i + 1) with values $u$ and $v$ and
labels $k$ and $j$? The box necessarily maintains the first invariant. Swapping $u$ and $v$ has no
effect. The step replacing $u$ by $u \oplus v$ also is not a problem: $k > j$ implies $v \cdot e_\ell = 0$ for all
$\ell \ge k$.

This circuit also maintains the second invariant. It clearly still holds for all wires besides
(i) and (i + 1). The value $v$ and label $j$ move unchanged from wire (i + 1) to wire (i), so it
holds for (i) as well. If label $\ell$ is on wire (h), with $h < i$ and $\ell < k$, then $u \cdot e_\ell = 0$. Also,
$v \cdot e_\ell = 0$, either by the first invariant if $\ell > j$, or by the second if $\ell < j$, so $(u \oplus v) \cdot e_\ell = 0$.
Finally, we have designed the box so that the output value on (i + 1), either $u$ or $u \oplus v$, is
orthogonal to $e_j$.

When the reversal network concludes, the labels are in order; wire (i) has label $i$. The two
invariants then imply that (i) contains $e_i$; that is, we have reached the identity matrix. □
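The whole reduction can be simulated on small inputs. The sketch below is one possible instantiation, not the authors' implementation: it assumes odd-even transposition sort as the sorting network, encodes each wire as a bitmask with e_k in bit k-1, and charges each round the depth of its deepest box. The names `transposition_rounds`, `reduce_to_identity`, and `random_nw_triangular` are hypothetical.

```python
import random


def transposition_rounds(n):
    """Comparators of odd-even transposition sort: n rounds on adjacent wires (0-based)."""
    return [[(i, i + 1) for i in range(r % 2, n - 1, 2)] for r in range(n)]


def reduce_to_identity(rows):
    """rows[i] is the bitmask on wire (i+1). Returns (final rows, total depth)."""
    n = len(rows)
    rows = list(rows)
    label = [n - i for i in range(n)]          # wire (i) starts with label n + 1 - i
    depth = 0
    for comparators in transposition_rounds(n):
        round_depth = 0
        for i, j in comparators:
            if label[i] < label[j]:
                continue                        # comparator not used by the reversal
            u, v = rows[i], rows[j]
            if (u >> (label[j] - 1)) & 1:       # u . e_j = 1: depth-2 box
                rows[i], rows[j] = v, u ^ v
                round_depth = max(round_depth, 2)
            else:                               # u . e_j = 0: depth-3 swap
                rows[i], rows[j] = v, u
                round_depth = 3
            label[i], label[j] = label[j], label[i]
        depth += round_depth
    return rows, depth


def random_nw_triangular(n, rng):
    """Row i (1-based) has a 1 in position n + 1 - i and zeros beyond it."""
    return [(1 << (n - i)) | rng.randrange(1 << (n - i)) for i in range(1, n + 1)]


rows, depth = reduce_to_identity([0b111, 0b011, 0b001])
print(rows, depth)  # [1, 2, 4] 7
```

For the 3 x 3 example the result is the identity at depth 7, within the bound of 3d for a sorting network of depth d = 3.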

7.4 Lower Bounds 

By Theorem 5.3, reversal requires depth $2n + 1$. Hence, we have already shown that the
minimum depth for the worst-case matrix in $\mathrm{GL}_n(2)$ is at least $2n + 1$. We now argue that
almost all invertible matrices require about this depth. By "almost all invertible matrices"
we mean a proportion of elements of $\mathrm{GL}_n(2)$ tending to 1 as $n$ goes to $\infty$. First we quote a
well-known result on ranks of random matrices.

Theorem 7.6. As $n$ goes to $\infty$, the proportion of $n \times n$ matrices over $\mathbb{F}_2$ having rank at
most $n - c$ is $O(2^{-c})$.

Sketch of proof. This follows from the fact that the number of $n \times n$ matrices of rank $k$ is
equal to the square of the number of $n \times k$ matrices of rank $k$ divided by the number of
invertible $k \times k$ matrices. To count these numbers of matrices, we use a standard formula of
Landsberg [5]; see Stanley [6, Section 1.3] for a more recent exposition. □
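The counting identity in the sketch can be evaluated directly for small $n$. The code below is an illustrative sketch: it assumes the standard count $\prod_{i=0}^{k-1}(2^n - 2^i)$ for $n \times k$ matrices of rank $k$ over $\mathbb{F}_2$, and the function names are hypothetical.

```python
from fractions import Fraction


def full_rank_count(n, k):
    """Number of n x k matrices over F_2 of rank k."""
    count = 1
    for i in range(k):
        count *= 2**n - 2**i
    return count


def rank_count(n, k):
    """Number of n x n matrices over F_2 of rank exactly k."""
    # Square of the n x k count, divided by the number of invertible k x k matrices.
    return full_rank_count(n, k) ** 2 // full_rank_count(k, k)


def proportion_rank_at_most(n, r):
    """Exact proportion of n x n matrices over F_2 with rank at most r."""
    return Fraction(sum(rank_count(n, k) for k in range(r + 1)), 2 ** (n * n))


print([rank_count(3, k) for k in range(4)])  # [1, 49, 294, 168]
print(float(proportion_rank_at_most(10, 8)))
```

As a sanity check, the counts for each $n$ sum to $2^{n^2}$, the total number of $n \times n$ matrices over $\mathbb{F}_2$.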

Lemma 7.7. Let $\varepsilon > 0$ be given. For almost all matrices $M$ in $\mathrm{GL}_n(2)$, every circuit
implementing $M$ has, for each integer $k$ with $1 \le k \le n/2$, at least $2k - \varepsilon n$ gates between
wires (k) and (k + 1) and also between wires (n - k) and (n - k + 1).

Proof. The proof uses the same technique as that of Lemma 5.2. As before, we consider
wires (k) and (k + 1). Choose $M \in \mathrm{GL}_n(2)$ uniformly at random, and consider a circuit
implementing $M$. We write the contents of the wires at any time as a block matrix, as
in that proof. Initially, $X$ and $Y$ are 0, and at the conclusion of the circuit, $X$ and $Y$ are two blocks
of our matrix $M$. For large enough values of $n$ and for a random choice of $M$, we expect $X$
and $Y$ each to have rank at least $k - (\varepsilon/2)n$; since each gate between (k) and (k + 1) changes
the total rank of $X$ and $Y$ by at most 1, we have at least $2k - \varepsilon n$ such gates. □

Counting gates between different pairs of bits yields the following theorem: 

Theorem 7.8. Let $\varepsilon > 0$ be given. For almost all matrices $M$ in $\mathrm{GL}_n(2)$, every circuit
implementing $M$ requires depth at least $(2 - \varepsilon)n$ and size at least $(1 - \varepsilon)n^2$.

A more careful analysis shows that the proportion of matrices in GL n (2) that can be 
implemented in depth at most 2n — m is 

8 Open Questions 

Let the "depth" of a matrix be the minimum depth of any circuit implementing the matrix. 
We have shown that the maximum depth of a matrix in $\mathrm{GL}_n(2)$ lies between $2n + 1$ and $5n$.
A natural question is whether we can close this gap. 

For several reasons, the authors feel that the maximum depth may be only $4n + O(1)$.
First, we consider circuit size: By Theorem 5.3, reversal requires at least $n^2/2$ gates, and
the construction of Section 7 computes any matrix in at most $5n^2/2$ gates. Bob Beals [1]
has shown that one can compute any matrix in only $2n^2$ gates. If we could pack these gates
into a rectangular array, we could implement the matrix in depth $4n$.

More precisely, let $\mathcal{V}$ be the set of all matrices implementable as V-shaped arrays of $\binom{n}{2}$
depth-2 boxes. A circuit in $\mathcal{V}$ has size at most $n^2$ and depth at most $4n$. Beals showed [1]
that $\mathcal{V}^2 = \mathrm{GL}_n(2)$: given $M$, he builds two circuits, one on either side of $M$, so that the
product is the identity. Let $\mathcal{S}$ be the set of all rectangular arrays of $\binom{n}{2}$ depth-2 boxes; a
circuit in $\mathcal{S}$ has size at most $n^2$ and depth at most $2n$. If we could similarly construct circuits
in $\mathcal{S}$ on either side of a matrix $M$ to reduce it to the identity, then $\mathcal{S}^2$ would equal $\mathrm{GL}_n(2)$,
and we could implement any matrix in depth $4n$.

By Proposition 7.3, we can use $\mathcal{S}$ to reduce any matrix to northwest-triangular form. It is
interesting to note that the subgroup of upper (or lower) triangular matrices in $\mathrm{GL}_n(2)$ has
index $\prod_{i=1}^{n} (2^i - 1)$, but its order is only $\prod_{i=1}^{n} 2^{i-1} = 2^{(n^2 - n)/2}$. Thus, one could argue that we
are "working harder" to reduce a general matrix to northwest-triangular form (in depth $2n$)
than to reduce the triangular matrix to the identity (in depth $3n$). One could imagine that
the latter reduction should be possible in the same depth as the former, providing further
evidence that $\mathcal{S}^2$ might contain all of $\mathrm{GL}_n(2)$.
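The comparison between the index and the order of the triangular subgroup is easy to tabulate. The sketch below assumes the standard formulas $|\mathrm{GL}_n(2)| = \prod_{i=0}^{n-1}(2^n - 2^i)$, order $2^{n(n-1)/2}$ for the triangular subgroup, and index $\prod_{i=1}^{n}(2^i - 1)$; the function names are illustrative.

```python
def gl_order(n):
    """Order of GL_n(2)."""
    order = 1
    for i in range(n):
        order *= 2**n - 2**i
    return order


def triangular_order(n):
    """Order of the upper-triangular subgroup: free entries above a unit diagonal."""
    return 2 ** (n * (n - 1) // 2)


def triangular_index(n):
    """Index of the triangular subgroup in GL_n(2)."""
    index = 1
    for i in range(1, n + 1):
        index *= 2**i - 1
    return index


for n in range(2, 7):
    print(n, triangular_index(n), triangular_order(n))
```

For $n = 3$ this gives index 21 and order 8, whose product is $|\mathrm{GL}_3(2)| = 168$.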

We performed exhaustive computer experiments for $n$ up to 6. The maximum depths are
shown in Table 2. While we are reluctant to draw inferences from such limited data, these
values suggest that the maximum depth may be as small as $2n + O(1)$. In other words, the
lower bound of Theorems 5.3 and 7.8 may be tight up to an additive constant.

Table 2: Maximum depth of a matrix in $\mathrm{GL}_n(2)$ for $n \le 6$ obtained by exhaustive search.
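An exhaustive search of this kind can be sketched as a breadth-first search over circuit layers. The code below is a hypothetical reconstruction, not the authors' program, and it does not reproduce the entries of Table 2: it assumes a layer is any nonempty set of gates on disjoint adjacent pairs, encodes wire contents as bitmasks, and is practical only for very small $n$ because it visits every element of $\mathrm{GL}_n(2)$. The names `layers` and `max_depth` are illustrative.

```python
def layers(n):
    """All nonempty sets of gates on disjoint adjacent pairs.
    A gate (t, s) means: wire t ^= wire s (0-based wire indices)."""
    result = []

    def extend(i, chosen):
        if i >= n - 1:
            if chosen:
                result.append(tuple(chosen))
            return
        extend(i + 1, chosen)                    # leave pair (i, i+1) idle
        extend(i + 2, chosen + [(i, i + 1)])     # add wire i+1 into wire i
        extend(i + 2, chosen + [(i + 1, i)])     # add wire i into wire i+1

    extend(0, [])
    return result


def max_depth(n):
    """Return (maximum depth, number of matrices reached) for n wires."""
    start = tuple(1 << k for k in range(n))      # identity matrix, one row per wire
    depth = {start: 0}
    frontier, moves, d = [start], layers(n), 0
    while frontier:
        d += 1
        nxt = []
        for state in frontier:
            for layer in moves:
                rows = list(state)
                for t, s in layer:               # gates in a layer touch disjoint wires
                    rows[t] ^= rows[s]
                new = tuple(rows)
                if new not in depth:
                    depth[new] = d
                    nxt.append(new)
        frontier = nxt
    return max(depth.values()), len(depth)


print(max_depth(2))  # (3, 6)
```

For $n = 2$ the search reports depth 3, attained by the swap, and reaches all 6 elements of $\mathrm{GL}_2(2)$.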

We also give some additional open questions:

• For addition, swap, and rotation, we have an upper bound for depth of $n + O(1)$ and
a lower bound of $n - 1$. What is the correct additive constant?

• For $n > 3$, the optimal depth for reversal is either $2n + 1$ or $2n + 2$. Which is correct?

• What is the correct depth for a general permutation? For small n, reversal is at least 
as hard as any other permutation; does this hold for all n? 

• For general matrices, we have a lower bound on size of $n^2/2$ and an upper bound of
$2n^2$ [1]. What is the correct answer?

• As noted earlier, some researchers [21 H] use the box as the basic unit of computation; 
in this model, we have a lower bound on depth of n + 1 (for reversal) and an upper 
bound of 2n. What is the correct answer? 

• Are there other classes of operations that can be implemented efficiently in this model? 

The last question above is perhaps the most intriguing. Our focus was on selecting 
natural operations on n wires and then determining their depth. An alternative approach 
would be to consider all circuits of a given depth and see what other useful operations can 
be performed. Such an analysis might suggest new efficient circuits for arbitrary matrices 
and might even yield new approaches to quantum circuit design. 

Acknowledgments 

Tom Draper helped with our early work on addition, swap, and rotation. Bob Beals made
many helpful suggestions and pointed out the generalization from Theorem 5.3 to
Theorem 7.8.






References 

[1] Robert M. Beals. Private communication, 2004. 

[2] Austin G. Fowler, Simon J. Devitt, and Lloyd C. L. Hollenberg. Implementation of 
Shor's algorithm on a linear nearest neighbour qubit array. Quantum Information and 
Computation, 4(4):237-251, 2004. 

[3] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. 
Addison-Wesley, second edition, 1998. 

[4] Samuel A. Kutin. Shor's algorithm on a nearest-neighbor machine. quant-ph/0609001,
2006.

[5] G. Landsberg. Über eine Anzahlbestimmung und eine damit zusammenhängende Reihe.
J. Reine Angew. Math., 111:87-88, 1893.

[6] Richard P. Stanley. Enumerative Combinatorics, volume 1. Cambridge University Press, 
1997. 

[7] Rodney Van Meter. Architecture of a quantum multicomputer optimized for Shor's fac- 
toring algorithm. PhD thesis, Keio University, 2006. Also quant-ph/0607065. 

[8] Rodney Van Meter and Kohei Itoh. Fast quantum modular exponentiation. Physical
Review A, 71:052320, 2005.






